📁 Latest file found: retail_sugar_prices_2025-04-16.csv
         date       admin1           admin2        market  market_id  latitude  longitude            category  commodity  commodity_id  unit  priceflag  pricetype  currency  price  usdprice
0  1994-01-15      Gujarat        Ahmadabad     Ahmedabad        923     23.03      72.62  miscellaneous food      sugar            97    KG     actual     retail       INR   13.5      0.43
1  1994-01-15    Karnataka  Bangalore Urban     Bengaluru        926     12.96      77.58  miscellaneous food      sugar            97    KG     actual     retail       INR   13.2      0.42
2  1994-01-15  Maharashtra      Mumbai city        Mumbai        955     18.98      72.83  miscellaneous food      sugar            97    KG     actual     retail       INR   13.8      0.44
3  1994-01-15       Orissa          Khordha  Bhubaneshwar        929     20.23      85.83  miscellaneous food      sugar            97    KG     actual     retail       INR   13.5      0.43
4  1994-01-15      Tripura     West Tripura      Agartala        921     23.84      91.28  miscellaneous food      sugar            97    KG     actual     retail       INR   16.0      0.51
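
The "latest file" lookup reported at the top of this output could be done along the following lines. This is a minimal sketch, not the notebook's actual code: the filename pattern is inferred from the single filename shown, and `pick_latest` is a hypothetical helper name.

```python
import re
from datetime import date

# Assumed naming scheme, inferred from "retail_sugar_prices_2025-04-16.csv".
PATTERN = re.compile(r"^retail_sugar_prices_(\d{4})-(\d{2})-(\d{2})\.csv$")

def pick_latest(filenames):
    """Return the filename carrying the most recent date, or None if none match."""
    dated = []
    for name in filenames:
        match = PATTERN.match(name)
        if match:
            year, month, day = map(int, match.groups())
            dated.append((date(year, month, day), name))
    # Compare on the parsed date so ordering does not depend on string sorting.
    return max(dated, key=lambda pair: pair[0])[1] if dated else None

# Illustrative filenames only; the first and third are made up.
names = [
    "retail_sugar_prices_2025-03-01.csv",
    "retail_sugar_prices_2025-04-16.csv",
    "notes.txt",
]
latest = pick_latest(names)
```

In practice the list of names would come from a directory listing, for example `[p.name for p in Path(folder).glob("*.csv")]`.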

📊 Insights from EDA of Retail Sugar Prices (1994–2025)


🧭 1. Overall Trends (Long-Term Behavior)

  • Sugar prices in India have risen substantially over the roughly three decades covered (1994–2025).
  • From roughly ₹13–16/kg in the January 1994 sample rows above, prices have climbed to over ₹45/kg in recent years.
  • A strong upward trend is clearly visible in the trend decomposition plot, especially during:
    • 2009–2011: Noticeable inflationary spike in sugar prices.
    • 2020–2022: Gradual increase likely influenced by pandemic-era disruptions.
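
A long-term trend like the one described above can be approximated by smoothing out the yearly cycle. The sketch below uses a trailing 12-month rolling mean as a simple stand-in for the trend component of a decomposition; the source does not say which decomposition method was used, and the series here is synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (made-up numbers): linear drift plus a 12-month cycle.
months = pd.date_range("1994-01-01", periods=120, freq="MS")
t = np.arange(120)
price = pd.Series(10 + 0.25 * t + 1.5 * np.sin(2 * np.pi * t / 12), index=months)

# A 12-month window averages over one full seasonal cycle, so the cycle cancels
# and only the slow-moving level remains (lagged by about half a year).
trend = price.rolling(window=12).mean()

# Month-over-month change of the trend recovers the underlying drift.
monthly_drift = trend.diff().dropna()
```

Because the window is trailing rather than centered, the trend lags the data; a centered average would align it better at the cost of losing values at both ends.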

📈 2. Monthly Average & Seasonality

  • Sugar prices show a seasonal pattern that repeats each year.
  • Prices tend to dip around April–May and rise again toward the year-end.
  • This pattern suggests:
    • A possible link to agricultural harvest cycles or festive demand variations.
    • E.g., Diwali or end-of-year celebrations may create demand surges.
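
One way to check a seasonal pattern like this is to express each month relative to its own year's average and then average by calendar month. The following is a sketch on synthetic data, built so that the cycle bottoms out in May and peaks in November; it is not the notebook's data or code.

```python
import numpy as np
import pandas as pd

# Synthetic series (made-up numbers): slow drift plus a yearly cycle.
months = pd.date_range("2015-01-01", periods=48, freq="MS")
t = np.arange(48)
cycle = -2 * np.cos(2 * np.pi * (months.month.to_numpy() - 5) / 12)
price = pd.Series(30 + 0.1 * t + cycle, index=months)

# Dividing by the yearly mean removes most of the long-run level change,
# leaving each month's position within its own year.
yearly_mean = price.groupby(price.index.year).transform("mean")
relative = price / yearly_mean

# Average relative level for each calendar month (index 1..12).
profile = relative.groupby(relative.index.month).mean()
cheapest_month = profile.idxmin()
dearest_month = profile.idxmax()
```

Values of `profile` below 1 mark months that are typically cheaper than the yearly average, and values above 1 mark the dearer months.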

🧮 3. Monthly Percentage Change

  • Most months show mild percentage changes (<5%), but a few months spike or drop sharply:
    • These may correspond to policy changes, import/export controls, or supply shocks.
    • Sudden changes may also align with known economic events or weather-related impacts.
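
The month-over-month changes discussed above can be computed with `pct_change`, and the 5% threshold from the text then separates mild months from sharp ones. The prices below are made up for illustration.

```python
import pandas as pd

# Made-up monthly average prices for six consecutive months.
monthly = pd.Series(
    [40.0, 40.8, 41.0, 44.5, 44.0, 41.0],
    index=pd.date_range("2020-01-01", periods=6, freq="MS"),
)

# Percentage change relative to the previous month (first value is NaN).
pct = monthly.pct_change() * 100

# Months whose absolute change exceeds the 5% threshold mentioned in the text.
large_moves = pct[pct.abs() > 5]
```

Here `large_moves` holds April (a rise of about 8.5%) and June (a fall of about 6.8%); these are the kinds of months worth checking against policy or supply events.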

📦 4. Distribution Insights

  • The most common sugar price historically was ₹8, which occurred 21 times, mostly in early years.
  • Overall, sugar prices follow a right-skewed distribution, with most values concentrated between ₹20 and ₹40/kg.
  • This skew is consistent with gradual, sustained inflation in consumer sugar pricing.
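
The mode and the direction of skew reported above can be read off a price column directly. This is a small sketch on made-up values, not the dataset itself.

```python
import pandas as pd

# Made-up prices: a cluster of low early values and one high recent value.
prices = pd.Series([8, 8, 8, 12, 20, 22, 25, 30, 35, 60], dtype=float)

# Most frequent price and how often it occurs.
most_common = prices.mode().iloc[0]
occurrences = int((prices == most_common).sum())

# Positive skewness means a longer tail on the high-price side (right skew).
skewness = prices.skew()
mean_above_median = prices.mean() > prices.median()
```

A mean that sits above the median is a quick cross-check that the tail is on the right.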

📅 5. Year-wise Variability

  • The boxplot shows wider price ranges in later years (post-2010), suggesting:
    • Increased volatility in the market.
    • Possibly driven by global market influences, fuel costs, or climate-driven variability.
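
The widening boxes can be quantified as the interquartile range (IQR) per year, which is the height of each box in a boxplot. The sketch below uses made-up prices for two years.

```python
import pandas as pd

# Made-up observations: a tight early year and a more dispersed later year.
df = pd.DataFrame({
    "year": [2005] * 4 + [2015] * 4,
    "price": [17.0, 18.0, 18.0, 19.0, 30.0, 36.0, 40.0, 46.0],
})

# IQR per year: the distance between the 75th and 25th percentiles.
iqr_by_year = df.groupby("year")["price"].agg(
    lambda s: s.quantile(0.75) - s.quantile(0.25)
)
```

Because prices also rise over time, dividing each year's IQR by that year's median would show whether relative volatility, not just absolute spread, has increased.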

🌡️ 6. Heatmap View

  • The heatmap confirms a repeating seasonal cycle:
    • Sugar prices are lower in mid-year months and higher towards the end/start of each year.
    • 2009, 2016, and 2020 stand out with unusual spikes, possibly due to external shocks.
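
A year-by-month heatmap of this kind is usually drawn from a pivoted grid of average prices. The sketch below builds such a grid from a few made-up rows; the column names `date` and `price` match the dataset header, while `year` and `month` are helper columns introduced here.

```python
import pandas as pd

# Made-up rows; two markets report in January 2009, so that cell is averaged.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2009-01-15", "2009-01-15", "2009-06-15", "2010-01-15", "2010-06-15"]
    ),
    "price": [30.0, 32.0, 28.0, 35.0, 31.0],
})
df = df.assign(year=df["date"].dt.year, month=df["date"].dt.month)

# Rows are years, columns are calendar months, cells are mean prices.
grid = df.pivot_table(index="year", columns="month", values="price", aggfunc="mean")
```

The resulting `grid` can be passed to a plotting function such as `seaborn.heatmap` or `matplotlib`'s `imshow`.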

Top states by data completeness:
              admin1  months_of_data
0       Maharashtra             257
1        Tamil Nadu             251
2         Rajasthan             241
3            Orissa             240
4  Himachal Pradesh             231
5     Uttar Pradesh             230
6         Karnataka             223
7    Madhya Pradesh             219
8            Kerala             217
9             Bihar             215

Bottom states by data completeness:
                  admin1  months_of_data
21          Uttarakhand             112
22       Andhra Pradesh             111
23  Andaman and Nicobar              93
24           Chandigarh              92
25             Nagaland              92
26           Puducherry              79
27                  Goa              72
28         Chhattisgarh              34
29               Sikkim              21
30              Manipur              13
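
A `months_of_data` table like the ones above could be derived by counting distinct calendar months per state. This is a sketch under the assumption that a month counts once if any market in the state reported a price in it; the source does not state the exact definition, and the rows are made up.

```python
import pandas as pd

# Made-up rows: Gujarat has two markets reporting in January 1994.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["1994-01-15", "1994-01-15", "1994-02-15", "1994-03-15", "1994-01-15"]
    ),
    "admin1": ["Gujarat", "Gujarat", "Gujarat", "Gujarat", "Tripura"],
})

# Count distinct year-month periods per state, most complete first.
completeness = (
    df.assign(month=df["date"].dt.to_period("M"))
      .groupby("admin1")["month"]
      .nunique()
      .sort_values(ascending=False)
      .rename("months_of_data")
      .reset_index()
)
```

Counting distinct periods rather than rows keeps states with many markets from looking more complete than they are.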
                  state  cluster
15          Maharashtra        0
28        Uttar Pradesh        0
26            Telangana        0
25           Tamil Nadu        0
23            Rajasthan        0
20               Orissa        0
14       Madhya Pradesh        0
13               Kerala        0
12            Karnataka        0
11            Jharkhand        0
30          West Bengal        0
8               Gujarat        0
6                 Delhi        0
2                 Assam        0
3                 Bihar        0
4            Chandigarh        0
10     Himachal Pradesh        0
0   Andaman and Nicobar        1
16              Manipur        1
24               Sikkim        1
1        Andhra Pradesh        2
22               Punjab        2
9               Haryana        2
29          Uttarakhand        2
7                   Goa        2
21           Puducherry        2
5          Chhattisgarh        2
19             Nagaland        3
18              Mizoram        3
17            Meghalaya        3
27              Tripura        3
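
A state-to-cluster table like the one above can be produced with scikit-learn's `KMeans`. The sketch below is illustrative only: the source does not say which features were clustered, so the two per-state features here (mean price and price standard deviation) are assumptions, the state names are placeholders, and the numbers are made up. It uses two clusters for brevity, whereas the output above has four.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Placeholder states with assumed features: [mean price, price std].
states = ["State A", "State B", "State C", "State D"]
features = np.array([
    [30.0, 1.0],
    [31.0, 1.2],
    [45.0, 4.0],
    [46.0, 4.3],
])

# Standardize so that neither feature dominates the distance calculation.
scaled = StandardScaler().fit_transform(features)

# n_init is set explicitly because its default changed in scikit-learn 1.4;
# random_state makes the run reproducible.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(scaled)

assignment = dict(zip(states, labels.tolist()))
```

Cluster numbers are arbitrary labels, so only the grouping is meaningful, not whether a group is called 0 or 1.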

🧠 Cluster Interpretations


✅ Cluster 0 – Majority Group (Stable/Aligned Trend)

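States: Maharashtra, Uttar Pradesh, Telangana, Tamil Nadu, Rajasthan, Orissa, Madhya Pradesh, Kerala, Karnataka, Jharkhand, West Bengal, Gujarat, Delhi, Assam, Bihar, Chandigarh, Himachal Pradesh

  • These states show similar seasonal patterns and, for the most part, consistent data availability.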
  • Likely follow national sugar price trends.
  • Suitable for country-level modeling or selecting a representative sample group.

🌴 Cluster 1 – Sparse/Irregular or Island Territories

States: Andaman & Nicobar, Manipur, Sikkim

  • All three have sparse data: Manipur (13 months) and Sikkim (21 months) have the least coverage of any state, and Andaman and Nicobar has 93 months.
  • May have unique supply chains (e.g., non-agricultural or import-dependent).
  • Not ideal for standard trend modeling without adjustment.

⚡ Cluster 2 – Semi-distinct / Volatile Trends

States: Andhra Pradesh, Punjab, Haryana, Uttarakhand, Goa, Puducherry, Chhattisgarh

  • Show greater price volatility or regional fluctuations.
  • Could be influenced by local governance, infrastructure, or supply-demand issues.
  • May need separate forecasting models or volatility handling.

🏔️ Cluster 3 – Northeast Focused Outliers

States: Nagaland, Mizoram, Meghalaya, Tripura

  • Exhibit distinctive pricing behavior compared to mainland states.
  • Possibly influenced by transport costs, geography, or border trade policies.
  • Important to treat as a unique regional group in analysis.

✅ EDA report saved to: C:\Users\neeti\Documents\ISB_Class of Summer_2025\04 Term 4\Foundation\Foundation-Project_Group-14\notebooks\eda_report.html
🚀 EDA notebook committed and pushed to GitHub.